Aim:

The goals of this project are to:

  • Determine whether demographics such as age, sex, and location have an impact on types of death and economic status.
  • Show the merits and demerits of visual analytics in data analysis, and how to improve it.
  • Make the visualizations tell the story unambiguously, in a way that is understandable even to a layperson.

Methodology:

The project will lay more emphasis on explanatory techniques, which will be used to present the data to viewers more succinctly. I therefore plan to use the R programming language to explore and analyze the dataset.

The dataset to be used is the World Health Nutrition and Population Statistics from year 2010 to 2016. It can be obtained from: http://databank.worldbank.org/data/reports.aspx?source=health-nutrition-and-population-statistics#advancedDownloadOptions

Load these libraries and the dataset, and let's get to work!

suppressMessages(library(knitr))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(plotly))
suppressMessages(library(sqldf))
suppressPackageStartupMessages(library(googleVis))
## Creating a generic function for 'toJSON' from package 'jsonlite' in package 'googleVis'
df <- read.csv("World_Health_2.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
kable(head(df[200:206, ]))
     Series_Name                                   Series_Code     Country_Name  Country_Code  YR1960  YR1970  YR1980  YR1990  YR2000  YR2010  YR2015
200  Children (ages 0-14) newly infected with HIV  SH.HIV.INCD.14  Saudi Arabia  SAU           ..      ..      ..      100     100     100     100
201  Children (ages 0-14) newly infected with HIV  SH.HIV.INCD.14  Senegal       SEN           ..      ..      ..      200     1000    1000    500
202  Children (ages 0-14) newly infected with HIV  SH.HIV.INCD.14  Serbia        SRB           ..      ..      ..      ..      ..      ..      ..
203  Children (ages 0-14) newly infected with HIV  SH.HIV.INCD.14  Seychelles    SYC           ..      ..      ..      ..      ..      ..      ..
204  Children (ages 0-14) newly infected with HIV  SH.HIV.INCD.14  Sierra Leone  SLE           ..      ..      ..      200     500     1000    500
205  Children (ages 0-14) newly infected with HIV  SH.HIV.INCD.14  Singapore     SGP           ..      ..      ..      ..      ..      ..      ..
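
The `..` entries in the rows above appear to be the export's placeholder for missing values, which is why the year columns are read as text. As a minimal sketch, assuming that interpretation holds, `read.csv` can map the placeholder to `NA` at load time through its `na.strings` argument. The inline CSV text below is a toy stand-in for World_Health_2.csv, not the real file.

```r
# Toy CSV text standing in for World_Health_2.csv (three rows only;
# the values echo the Saudi Arabia, Serbia and Senegal rows shown above).
csv_text <- "Country_Code,YR1990,YR2000
SAU,100,100
SRB,..,..
SEN,200,1000"

# na.strings = ".." turns the placeholder into NA while reading,
# so the year columns arrive as numeric instead of character.
toy <- read.csv(text = csv_text, na.strings = "..", stringsAsFactors = FALSE)

str(toy)
colSums(is.na(toy))  # number of missing values per column
```

Reading the file this way would make a later `as.numeric` conversion unnecessary, at the cost of assuming `..` never carries any other meaning in the file.
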
  • World longitudes and latitudes
lat_long <- read.csv("Countries_long_lat2.csv", header = TRUE, sep = ",")
colnames(lat_long) <- c("Country", "Country_Code", "Latitude", "Longtitude")
kable(head(lat_long))
Country         Country_Code  Latitude  Longtitude
Albania         ALB           41.0000   20.0000
Algeria         DZA           28.0000   3.0000
American Samoa  ASM           -14.3333  -170.0000
Andorra         AND           42.5000   1.6000
Angola          AGO           -12.5000  18.5000
Anguilla        AIA           18.2500   -63.1667

Cleaning the dataset: merging it with the coordinates table and converting the year columns to numeric.

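# Inner join on the shared country code; countries missing from either table are dropped.
df2 <- merge(df, lat_long, by = "Country_Code", all = FALSE)
# Convert the year columns to numeric; the ".." placeholders become NA.
# suppressWarnings() silences only this coercion instead of all warnings globally.
yr_cols <- grep("^YR", names(df2))
df2[yr_cols] <- lapply(df2[yr_cols], function(x) suppressWarnings(as.numeric(x)))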
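
Because the merge uses `all = FALSE`, any country code present in only one of the two tables is dropped without notice. The sketch below, on hypothetical toy tables (`health`, `coords`, and the code `XKX` are illustrative names, not part of the real data), shows one way to list those codes before merging and to confine the coercion warning to the conversion step.

```r
# Hypothetical toy tables; in the report the real ones are df and lat_long.
health <- data.frame(Country_Code = c("PER", "MEX", "XKX"),
                     YR2010 = c("54000", "71000", ".."),
                     stringsAsFactors = FALSE)
coords <- data.frame(Country_Code = c("PER", "MEX", "HTI"),
                     Latitude = c(-10, 23, 19),
                     Longtitude = c(-76, -102, -72.4167))

# Codes in the health table that an inner merge (all = FALSE) would drop.
dropped <- setdiff(health$Country_Code, coords$Country_Code)
print(dropped)

merged <- merge(health, coords, by = "Country_Code", all = FALSE)

# ".." cannot be parsed as a number, so it becomes NA; suppressWarnings()
# hides only this expected coercion warning.
merged$YR2010 <- suppressWarnings(as.numeric(merged$YR2010))
print(merged)
```

Checking `dropped` first makes it easier to tell whether a country is absent from the map because it has no data or because its code did not match.
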

Merging the latitude and longitude columns into a single coordinate column for use in maps (googleVis).

df2$Lat_Long = paste(df2$Latitude, df2$Longtitude, sep=":")
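
As a small illustration of the string format this produces, the sketch below builds `latitude:longitude` strings from three coordinate pairs (those of Peru, Mexico and Haiti) and splits them back into numbers. The vector names are illustrative only.

```r
# Coordinates for Peru, Mexico and Haiti.
lat <- c(-10, 23, 19)
lon <- c(-76, -102, -72.4167)

# Same construction as df2$Lat_Long: "latitude:longitude"
lat_long_str <- paste(lat, lon, sep = ":")
print(lat_long_str)

# Split the strings back into numbers to confirm nothing was lost.
parts <- do.call(rbind, strsplit(lat_long_str, ":", fixed = TRUE))
round_trip <- data.frame(Latitude  = as.numeric(parts[, 1]),
                         Longitude = as.numeric(parts[, 2]))
print(round_trip)
```

Note that `paste` turns missing coordinates into the literal string "NA:NA", so rows with `NA` latitude or longitude may be worth filtering out before mapping.
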

We will now use SQL to query the columns and compare the years 2000 and 2010 for countries where the number of children orphaned by HIV/AIDS was at least 50,000 in both years.

sq <- sqldf("select Lat_Long, Country_Name, YR2000, YR2010 from df2 
            where YR2010 >= 50000 and YR2000 >= 50000 and Series_Name ='Children orphaned by HIV/AIDS'
            order by YR2010, YR2000 limit 20")
head(sq)
##      Lat_Long Country_Name YR2000 YR2010
## 1     -10:-76         Peru  61000  54000
## 2       -1:15  Congo, Rep.  52000  70000
## 3     23:-102       Mexico  66000  71000
## 4       17:-4         Mali  51000  91000
## 5   2.5:112.5     Malaysia  51000  96000
## 6 19:-72.4167        Haiti 110000 100000
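
For readers who prefer not to go through SQL, the same filter, sort and limit can be sketched in base R. The toy data frame below reuses three rows from the output above plus one hypothetical row ("Exampleland") that falls below the threshold; the `Change` column is an added illustration, not part of the original query.

```r
# Toy stand-in for df2; "Exampleland" is a made-up row below the threshold.
toy <- data.frame(
  Country_Name = c("Peru", "Mexico", "Haiti", "Exampleland"),
  Series_Name  = "Children orphaned by HIV/AIDS",
  YR2000 = c(61000, 66000, 110000, 1200),
  YR2010 = c(54000, 71000, 100000, 900),
  stringsAsFactors = FALSE
)

# WHERE clause: both years at least 50000, for the chosen series.
# The is.na() checks keep rows with missing values out of the result.
keep <- !is.na(toy$YR2000) & !is.na(toy$YR2010) &
  toy$YR2000 >= 50000 & toy$YR2010 >= 50000 &
  toy$Series_Name == "Children orphaned by HIV/AIDS"
res <- toy[keep, c("Country_Name", "YR2000", "YR2010")]

# ORDER BY YR2010, YR2000 (ascending, as in the SQL), then LIMIT 20.
res <- head(res[order(res$YR2010, res$YR2000), ], 20)

# Difference between the two years, to make the comparison explicit.
res$Change <- res$YR2010 - res$YR2000
print(res)
```

Since the sort is ascending, the limit keeps the 20 smallest qualifying counts whenever more than 20 countries pass the filter. Sorting with `order(-res$YR2010, -res$YR2000)` (or `order by YR2010 desc` in the SQL) would keep the largest instead.
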
  • The world map showing the countries where children were orphaned by HIV/AIDS (2000, 2010)
Show_map <- gvisMap(sq, "Lat_Long" , "Country_Name",
              options=list(showTip=TRUE, mapType='normal',
              enableScrollWheel=TRUE,
              icons=paste0("{",
              "'default': {'normal': 'http://icons.iconarchive.com/",
              "icons/icons-land/vista-map-markers/48/",
              "Map-Marker-Ball-Azure-icon.png',\n",
              "'selected': 'http://icons.iconarchive.com/",
              "icons/icons-land/vista-map-markers/48/",
              "Map-Marker-Ball-Right-Azure-icon.png'",
              "}}")))
                        
plot(Show_map)
## starting httpd help server ... done
Note that this map was saved as a local file. For an interactive version, run the code above.

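# Each country contributes a single (YR2010, YR2000) pair, so points are drawn;
# a line per country would have nothing to connect.
g <- ggplot(sq, aes(YR2010, YR2000)) +
  geom_point(aes(colour = Country_Name)) +
  labs(title = "Top 20 Countries With Children Orphaned by HIV/AIDS, 2000 and 2010") +
  geom_smooth(se = TRUE)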

ggplotly(g)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Tools To Be Used:

  • googleVis
  • plotly
  • ggplot2
  • sqldf

Packages To Be Used:

  • plotly
  • knitr
  • dplyr
  • plyr
  • reshape2
  • ggplot2
  • graphics
  • ggthemes
  • googleVis, etc.